Ensembles of data-reduction-based classifiers for distributed learning from very large data sets

نویسندگان

R. A. Mollineda

J. M. Sotoca

J. S. Sánchez

چکیده

An ensemble of classifiers is a set of classification models and a method to combine their predictions into a joint decision. They were primarily devised to improve classification accuracies over individual classifiers. However, the growing need for learning from very large data sets has opened new application areas for this strategy. According to this approach, new ensembles of classifiers have been addressed to partition a large data set into possible disjoint moderate-sized (sub)sets, which are then distributed along with a number of individual classifiers across multiple processors for parallel operation. A second benefit of reducing sizes is to make feasible the use of well-known learning methods which are not appropriate for handling huge amount of data. This paper presents such a distributed model as a solution to the problem of learning on very large data sets. As individual classifiers, data-reduction-based techniques are proposed due to their abilities to reduce complexities and, in much cases, error rates.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

IRDDS: Instance reduction based on Distance-based decision surface

In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful for classifiers. Therefore, it is convenient to discard irrelevant instances from the training set. This process is known as instance reduction, which is an important task for classifiers since through this process the time for classif...

متن کامل

Selected Prior Research

• 1996 scaled tree-based classifiers to very large data sets. A fundamental challenge in data mining is to mine data sets that are so large that they do not fit into a computer’s memory. This is important for a wide variety of applications ranging from homeland defense to identifying fraudulent credit card transactions. One of the most accurate techniques in data mining is tree-based classifier...

متن کامل

MMDT: Multi-Objective Memetic Rule Learning from Decision Tree

In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider accuracy and interpretation of rules sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalance classes’ distribution data sets. This...

متن کامل

Optimal ensemble construction via meta-evolutionary ensembles

In this paper we propose a meta-evolutionary approach to improve on the performance of individual classifiers. In the proposed system, individual classifiers evolve, competing to correctly classify test points, and are given extra rewards for getting difficult points right. Ensembles consisting of multiple classifiers also compete for member classifiers, and are rewarded based on their predicti...

متن کامل

SVM Ensembles Are Better When Different Kernel Types Are Combined

Support Vector Machines (SVM) are strong classifiers, but large data sets might lead to prohibitively long computation times and high memory requirements. SVM ensembles, where each single SVM sees only a fraction of the data, can be an approach to overcome this barrier. In continuation of related work in this field we construct SVM ensembles with Bagging and Boosting. As a new idea we analyze S...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2004

Ensembles of data-reduction-based classifiers for distributed learning from very large data sets

نویسندگان

چکیده

منابع مشابه

IRDDS: Instance reduction based on Distance-based decision surface

Selected Prior Research

MMDT: Multi-Objective Memetic Rule Learning from Decision Tree

Optimal ensemble construction via meta-evolutionary ensembles

SVM Ensembles Are Better When Different Kernel Types Are Combined

عنوان ژورنال:

اشتراک گذاری